1 Introduction and Motivation
Introduction to Data Visualization with R
2 Introductions
2.0.1 Background
- Lead Data Scientist for Statistical Computing at the Urban Institute
- Adjunct Professor in the McCourt School of Public Policy at Georgetown University
- American Statistical Association Traveling Course Instructor
2.0.2 R Projects
- Synthetic data generation (rstudio::conf(2022) talk about library(tidysynthesis))
- Formal privacy/differential privacy evaluation
- Projects that iterate with R Markdown/Quarto
- Manage the Urban Institute ggplot2 theme (Examples) (Code)
- Urban Institute R Users Group
3 Outline
3.0.1 Process
- Please consider turning on your cameras.
- Please ask questions at any time. You can speak up, raise your hand, or drop your question in the chat.
- I need to know how you are doing. Please ask lots of questions and give your reactions.
- I will check in during breaks about pacing and content.
- We will skip some exercises. Don’t worry, I’ve shared solutions to all exercises!
3.0.2 Goals
- Enthusiasm
- Develop a firm foundation with R
- Leave with enough understanding and resources that you can apply the covered material to your own work
- You will still need to look stuff up!
- I will try to give you hints for where to find help
4 Questions for You
- What types of analyses do you develop?
- What is your programming experience?
- What are you most interested to learn?
5 Motivation
5.0.1 Code-First Data Analysis
I believe in a code-first approach to data analysis.
- Code maximizes the chance of catching mistakes when they inevitably happen.
- Code is the clearest way to document and share an analysis.
- Reproducible code creates a single source of truth. Any result can be mapped back to the code and data that created the result.
- Code allows for robust version control.
- Code can scale analyses to bigger data and bigger projects.
- R is entirely free and open source.
- Code is the best way to gain access to new techniques and cool stuff.
6 Content
6.0.1 Core Content
- Introductions and Motivation
- Grammar of Graphics
- Jon Schwabish’s Five Guidelines for Better Data Visualizations
6.0.2 Optional Content
- Visualizing big data
- Visualizing regression models
- Data munging for visualization
- Visualizing time series data
7 Why Data Visualization?
- Data visualization is exploratory data analysis (EDA)
- Data visualization is diagnosis and validation
- Data visualization is communication
8 Why ggplot2
8.0.1 1. Looks good!
library(ggplot2) is used by FiveThirtyEight, the Financial Times, the BBC, the Urban Institute, and more.
8.0.2 2. Flexible and expressive
By breaking data visualization into component parts, library(ggplot2) is a set of building blocks instead of a set of rigid cookie cutters.
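As a minimal sketch of the building-block idea, the chart below is assembled from separate pieces: data, aesthetic mappings, a geometric object, labels, and a theme. It assumes the `mpg` example dataset that ships with ggplot2; the variable choices and labels are illustrative only.

```r
library(ggplot2)

# mpg is an example dataset included with ggplot2
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # data + aesthetic mappings
  geom_point(alpha = 0.5) +                                    # geometric object
  labs(
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon"
  ) +                                                          # labels
  theme_minimal()                                              # theme

# Printing the object draws the chart
p
```

Swapping one block for another, for example `geom_point()` for `geom_smooth()`, changes the chart without rewriting the rest of the code.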
8.0.3 3. Reproducible
8.0.4 4. Scalable
It’s almost as easy to make the 100th chart as it is to make the 2nd chart. This allows for iteration.
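One way to see this, sketched under the assumption that we want the same histogram for several columns of the `mpg` example data: wrap the chart in a function and apply it to each variable. The helper name `plot_histogram()` is hypothetical and not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: one histogram for a numeric column of mpg
plot_histogram <- function(variable) {
  ggplot(data = mpg, mapping = aes(x = .data[[variable]])) +
    geom_histogram(bins = 20) +
    labs(
      x = variable,
      y = "Count",
      title = paste("Distribution of", variable)
    )
}

# The same code makes the 1st chart and the 100th chart
variables <- c("displ", "cty", "hwy")
plots <- lapply(variables, plot_histogram)
names(plots) <- variables

# Draw one of the charts
plots$hwy
```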
8.0.5 5. In my analysis workflow
Data visualization is fundamental to EDA, statistical modeling, and basically any work with data. Too many people find themselves using different tools for data visualization and statistical modeling. R/ggplot2 allows everything to happen in the same script at the same time.
Too often, switching from a programming language to Excel results in parsing errors or cell-reference errors.
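A minimal sketch of modeling and visualization living in one script, assuming the `mpg` example data from ggplot2 and a simple linear regression chosen only for illustration:

```r
library(ggplot2)

# Fit a simple linear regression on the mpg example data
fit <- lm(hwy ~ displ, data = mpg)

# Collect fitted values and residuals from the model
diagnostics <- data.frame(
  fitted = fitted(fit),
  residual = resid(fit)
)

# Plot residuals against fitted values in the same script
ggplot(data = diagnostics, mapping = aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted value", y = "Residual")
```

Because the model and the chart share one script, there is no export or copy-paste step where values could be misparsed.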
9 R Markdown
This short course will rely on R Markdown, which is a literate statistical programming framework that combines text and images, code, and code output into output documents like PDFs and web pages. It is like an easier-to-use LaTeX with more flexibility. Instead of .R scripts, we will use .Rmd documents.
- Markdown
- YAML Header
- Code chunks
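To make the three parts concrete, the sketch below writes a tiny .Rmd file from R and prints it. The file name and contents are hypothetical; the chunk fences are built with `strrep()` only so the example can sit inside a code block.

```r
# Three backticks open and close a code chunk in R Markdown
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                               # YAML header: document-level options
  "title: \"A minimal example\"",
  "output: html_document",
  "---",
  "",
  "## A Markdown heading",             # Markdown: formatted text
  "",
  "Text with *italics* and **bold**.",
  "",
  paste0(fence, "{r}"),                # Code chunk: R code that runs on knit
  "summary(cars)",
  fence
)

# Write the skeleton to a temporary file (hypothetical file name)
rmd_path <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_path)

# Print the file to see the three parts together
cat(readLines(rmd_path), sep = "\n")
```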
9.0.1 Running code in documents
We will mostly run code inside of .Rmd documents.
- Run the code like a .R script
- Run the entire current chunk
- Run all chunks above
9.0.2 Knitting documents
More commonly, documents are knitted. This runs all of the code in the .Rmd in a new R session and then creates an output document like a .html or a .pdf. If the code has errors, knitting will fail.
Click the Knit button when a .Rmd document is open in RStudio to knit the document.
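Knitting can also be started from code with `rmarkdown::render()`. The sketch below assumes a small hypothetical .Rmd file written to a temporary directory, and it only renders when pandoc is available (RStudio bundles pandoc). One difference worth hedging: the Knit button uses a new R session, while `render()` called from the console evaluates in the current environment by default.

```r
library(rmarkdown)

# Build a tiny .Rmd file to knit (hypothetical name and contents)
fence <- strrep("`", 3)
rmd_path <- file.path(tempdir(), "knit-example.Rmd")
writeLines(
  c(
    "---",
    "title: \"Knit example\"",
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r}"),
    "mean(cars$speed)",
    fence
  ),
  rmd_path
)

# render() runs every chunk and returns the path of the output document
output_path <- NULL
if (pandoc_available()) {
  output_path <- render(rmd_path, quiet = TRUE)
}

output_path
```

If a chunk contains an error, `render()` stops with that error and no output document is created, which matches the behavior of the Knit button.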
9.0.3 Exercise 1
Step 1: Open RStudio by double-clicking 2024_asa-data-viz.Rproj
Step 2: Open 02_workbook.Rmd in RStudio. Make sure it is in 2024_asa-data-viz.Rproj. That is, you should not see Project: (None) in the top right of RStudio.
Step 3: Click Knit!